NSF PAR Search | NSF Public Access Repository

TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations

Guan, J; Hu, Z; Antonopoulos, C; Bellas, N; Lalis, S; Smirni, E; Zhou, G; Agrawal, G; Ren, B (June 2025, ACM - Proceedings of ICS 2025)

The demand for Deep Neural Network (DNN) execution (including both inference and training) on mobile system-on-a-chip (SoC) platforms has surged, driven by factors such as the need for real-time latency, privacy, and reduced vendor costs. Mainstream mobile GPUs (e.g., Qualcomm Adreno GPUs) usually have a 2.5D L1 texture cache that offers throughput superior to that of on-chip memory. However, to date, there is limited understanding of the performance characteristics of such a 2.5D cache, which limits the optimization potential. This paper introduces TMModel, a framework with three components: 1) a set of micro-benchmarks and a novel performance assessment methodology to characterize a poorly documented architecture with 2D memory, 2) a complete analytical performance model configurable for different data access pattern(s), tiling size(s), and other GPU execution parameters for a given operator (and its associated size and shape), and 3) a compilation framework that incorporates this model and generates optimized code with low overhead. TMModel is validated both on a set of DNN kernels and for training complete models on mobile GPUs.
Full Text Available
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

Niu, W; Sanim, M; Shu, Z; Guan, J; Shen, X; Yin, M; Agrawal, G; Ren, B (April 2024, ACM)

Full Text Available
Engineering Human Mesenchymal Bodies in a Novel 3D-Printed Microchannel Bioreactor for Extracellular Vesicle Biogenesis

Jeske, R.; Chen, X.; Mulderrig, L.; Liu, C.; Cheng, W.; Zeng, O. Z.; Zeng, C.; Guan, J.; Hallinan, D.; Yuan, X.; et al (January 2022, Bioengineering)

Full Text Available
A fast X-ray transient from a weak relativistic jet associated with a type Ic-BL supernova

https://doi.org/10.1038/s41550-025-02571-1

Sun, H; Li, W-X; Liu, L-D; Gao, H; Wang, X-F; Yuan, W; Zhang, B; Filippenko, A V; Xu, D; An, T; et al (July 2025, Nature Astronomy)

Full Text Available
Maize resistance to witchweed through changes in strigolactone biosynthesis

https://doi.org/10.1126/science.abq4775

Li, C.; Dong, L.; Durairaj, J.; Guan, J.-C.; Yoshimura, M.; Quinodoz, P.; Horber, R.; Gaus, K.; Li, J.; Setotaw, Y. B.; et al (January 2023, Science)

Elucidation of the maize strigolactone biosynthetic pathway has the potential for controlling the parasitic witchweed Striga.
Full Text Available
Multi-messenger Observations of a Binary Neutron Star Merger

https://doi.org/10.3847/2041-8213/aa91c9

Abbott, B. P.; Abbott, R.; Abbott, T. D.; Acernese, F.; Ackley, K.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R. X.; Adya, V. B.; et al (October 2017, The Astrophysical Journal)

Full Text Available